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Garth  P.  McCormick 


1.  Introduction 


The  general  mathematical  programming  (optimization)  problem  can  be 

stated  in  the  following  form:  find  values  (xn,...,x  ) which  solve 

1 n 

minimize  f(x) 


subject  to  the  restrictions  that  (1) 

gi(x)  > 0 , for  i=l,...,m  , 

and 

hj(x)  = 0 , for  j=l,. . . ,p  . 

What  is  meant  by  a "large"  optimization  problem  depends  upon  the  nature 
of  the  functions  f , {g^}  , and  {h  } . If  they  are  linear  functions, 

it  is  usually  the  case  that  the  restrictions  x,  > 0 for  j=l,...,n  are 

included  in  the  inequality  constraints  and  the  problem  is  considered 
large  when  (ra+p-n)  is  more  than  2,000.  For  linear  programming  problems, 
the  size  of  n is,  for  all  practical  purposes,  unlimited.  When  the  func- 
tions are  nonlinear,  the  size  of  n becomes  important,  and  a large  prob- 
lem can  be  one  where  n is  above  50.  An  exact  definition  is  hard  to 
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give  because  the  difficulty  in  solving  a general  nonlinear  optimization 
problem  has  as  much  to  do  with  the  nature  of  the  functions  involved  as 
it  does  with  the  dimensions  of  the  problems.  It  is  best  first  to  des- 
cribe situations  which  give  rise  to  large  nonlinear  programming  problems 
and  analyze  their  algebraic  structure.  This  is  done  in  Section  2.  In 
Section  3 there  is  an  overview  of  current  computer  coded  algorithms  for 
solving  large  nonlinear  programs.  In  Section  4 is  some  speculation  on 
future  research  needed  in  this  area  with  emphasis  on  a new  approach 
called  factorable  programming. 


2.  Large  Nonlinear  Programming  Models 


A major  source  of  large  nonlinear  (and  linear  also)  optimization 
problems  comes  from  the  general  area  known  as  "resource  allocation." 


The  variables  of  the  problem  are  usually  written  as  {x^}  * where  k=l. 


...,p  , and  <1=1,..., q . Here  p is  the  number  of  resources  available, 
and  q is  the  number  of  tasks  or  missions  which  can  utilize  the  resources. 


The  functions  f , {g^}  , {h ^ } are  usually  cost  functions,  effectiveness 


functions,  and  functions  which  represent  constraints  on  the  resources. 


The  number  of  nontrivial  constraints  is  usually  of  the  order  of  (p+q)  , 
but  the  problem  can  become  "large"  quickly  because  the  number  of  variables 
is  equal  to  p*q  . The  following  [Bracken  and  McCormick  1968]  is  a simple 
example  in  the  area  of  weapons  allocation  which  illustrates  this  class  of 


problems. 


Variables. 


number  of  weapons  of  type  k to  be  allocated  to  target  £ , 
for  k=l,...,p;  £■!,..., q . 


Fixed  inputs 


(for  k=l,...,p)  total  weapons  of  type  k available 


the  probability  that  target  £ will  be  undamaged  by  an 
attack  of  one  of  weapon  type  k 


the  military  value  of  target  £ . 
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The  optimization  problem  is  then 

maximize 

{xu} 

subject  to  ^=1  x^  < ak  , for  k=l,...,p 

and  Xj.  > 0 , for  k=l,...,p;  £“l,...,q  . 

Problems  of  this  type  typically  have  a special  structure.  In  this 
case  the  constraints  are  linear  and  "sparse."  The  density  of  the  matrix 
of  nontrivial  linear  constraints  is  1/p  with  only  the  number  "1."  defin- 
ing the  matrix.  The  constraints  are  all  of  the  form  of  Generalized  Upper 
Bounds  (GUB).  Algorithms  for  solving  linear  problems  have  been  able  re- 
cently to  take  advantage  of  this. 

The  concept  of  sparseness  for  nonlinearly  constrained  problems 
has  not  been  developed  in  general.  This  example  will  be  examined  further 
when  it  is  shown  how  to  use  linear  programming  to  solve  separable  optimi- 
zation problems. 

Another  important  area  of  nonlinear  programming  is  that  of  optimal 
structural  design.  This  is  an  area  where  the  ability  to  generate  large 
difficult  problems  has  outstripped  the  ability  of  algorithmists  to  solve 
them.  A typical  problem  takes  the  following  form.  Let  ot  be  a vector  of 
design  variables,  and  x a vector  of  state  variables.  The  usual  objective 
function  in  design  problems  is  weight  (to  be  minimized).  The  design  vari- 
ables are  often  cross-sectional  areas  of  the  members  of  the  structure.  The 
constraints  usually  involve  compatibility,  equilibrium,  and  force  deforma- 
tion. 

Variables. 

a = design  variables 

= state  variables  (in  one  formulation  these  are  eliminated 
from  the  problem) 


nq  f p 

^•£=1  ug,|^  “ ^k=l  (expected  target  damage  value) 
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Fixed  inputs  and  derived  quantities. 

S(a)  = a positive  definite  matrix  which  depends  on  the  design 
variables 

A = a matrix  which  depends  upon  the  geometry  of  the  problem 

M = mass  matrix 

P = vector  of  applied  forces 

a = allowable  stress  for  material  used 

A = vector  of  allowable  deflections  for  state  variables 
T 

K(a)  = A S(a)A  , the  stiffness  matrix 
= length  of  ith  member 
p = density  of  material. 


The  optimization  problem  is  then 

minimize  p £ L.a. 
(a,x)  i 1 1 

subject  to  x < A 

K(a)x  = P 


(Weight) 

(Deflection  Constraints) 
(Stiffness  Constraints) 


S(a)Ax  < ad  (Stress  Constraints) 

Another  interesting  complication  occurs  if  a buckling  constraint  is  im- 
posed. This  takes  the  form  i (some  specified  force)  , where 

Amin(oO  is  the  smallest  eigenvalue  of  the  generalized  eigenvalue  problem 

K(a)y  = My  A . 


For  large  structures,  the  number  of  variables  and  constraints  be- 
comes quite  large.  An  alternative  formulation  which  reduces  the  number 

of  variables  is  to  solve  for  x = K(a)  ^P  and  substitute  this  wherever 
x appears.  There  are  difficulties  in  obtaining  derivatives  for  this 
implicitly  defined  problem,  but  it  can  be  done  with  some  effort  and 
thought. 


- 4 - 
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Use  of  finite  element  techniques  for  shell  structures  is  a 
source  of  large  nonlinear  optimization  problems. 

There  is  an  enormous  literature  on  structural  design.  A sample 
of  material  in  the  area  is  contained  in  [Haug  1973]  and  [Cohn  and  Maier 
1978]. 


The  final  example  of  large  nonlinear  optimization  problems  comes 
from  the  logistics  area,  inventory  control.  In  [Schrady  and  Choe  1971] 
the  following  problem  was  formulated. 


Variables . 


Q^,r^  for  i=l,...,N  , the  amount  to  order  and  the  reorder 
point  of  the  ith  inventory  item. 


Constants  and  fixed  inputs. 


c.  = item  unit  cost 

l 

A^  = mean  demand  per  unit  time  (in  units) 

= maximum  amount  of  money  allowed  to  be  tied  up  in  in- 
ventory 

1<2  = reorder  workload  limit  allowed  to  be  tied  up  in  in- 

ventory 

fL , CL  = mean  and  standard  deviation  of  lead  time  demand  (which 
is  assumed  normally  distributed)  for  the  ith  item. 

Let  <Mx)  denote  the  normal  density  function  with  mean  0,  standard 

CO 

deviation  1.,  and  let  <I>(z,  = / <J>(x)dx  . The  optimization  problem  which 
produces  the  optimal  order  quantities  and  reorder  points  is 

{qn!r'r  f = ^ A D°i + rri"yi)2!  ■ ai(rryi)  <t,(J~)] 

(expected  time-weighted  shortages) 
vN 

subject  to  g^  = I<2  - Ai/Qi  > 0 (reorder  workload  constraint) 

cN 

g2  = “ ii=2  + Q^/2  - Pi)  > 0 (investment  constraint) 


Q.  > 0 , i«l,...,N  . 
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When  N , the  number  of  inventory  items,  is  large,  this  generates  a large 
nonlinear  programming  problem.  It  has  a special  structure  which  can  be 
utilized  to  solve  it.  This  is  discussed  later  in  the  paper  when  the  SUMT 
algorithm  is  applied  to  solve  the  problem. 

Any  nonlinear  programming  problem  which  has  a dynamic  character, 
i.e.,  whose  variables  are  time-dependent,  is  usually  large  since  the  num- 
ber of  variables  is  equal  to  the  product  of  the  "x's"  times  the  number  of 
time  periods.  Examples  of  this  are  problems  in  optimal  control,  the  cal- 
culus of  variations,  and  economic  planning  models.  More  sophisticated 
dynamic  versions  of  the  weapons  assignment  problem  with  elements  of  game 
theory  and  max-min  also  generate  large  problems.  See  [Tomlin  1978]  for 
a discussion  of  this.  These  problems  generally  have  a decomposable  struc- 
tu  e which  can  be  used  to  uncouple  the  problem.  Since  one  of  the  papers 
in  this  meeting  concerns  that  approach  it  will  not  be  discussed  further 
here. 

3.  Computer  Coded  Algorithms  for  Solving 
Large  Nonlinear  Programming  Problems 

There  are  direct  and  indirect  methods  for  solving  large  nonlinear 
optimization  problems.  The  indirect  ways  consist  usually  of  solving  a 
sequence  of  linear  or  linearly  constrained  problems  since  computer  pro- 
grams for  solving  these  problems  are  highly  developed.  The  simplest  way 
of  doing  this  was  proposed  in  [Griffith  and  Stewart  1961].  The  idea  is 
to  make  local  linear  approximations  to  the  objective  function  and  con- 
straints and  solve  the  resulting  linear  programming  problems. 

The  second  way  is  to  take  a general  nonlinear  optimization  prob- 
lem and  solve  it  using  a sequence  of  linearly  constrained  nonlinear  pro- 
gramming problems.  Recent  codes  for  solving  the  latter  are  [Lasdon  and 
Warren  1975]  and  [Murtagh  and  Saunders  1978].  Some  papers  relating  to 
the  theory  and  algorithmic  developments  for  converting  nonlinearly  con- 
strained problems  into  sequences  of  linearly  constrained  problems  are 
discussed  in  [Powell  1978]. 

The  third  approach  is  to  take  a nonlinear  programming  problem, 
convert  it  to  an  equivalent  separable  nonlinear  programming  problem  by 
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adding  equality  constraints  and  extra  variables,  then  solve  it  using  the 
separable  programming  capability  which  is  available  with  some  of  the  large- 
scale  linear  programming  systems  [Beale  1975].  Just  about  any  nonlinear 
programming  problem  can  be  "separated"  using  some  simple  tricks.  To  illus- 
trate this,  consider  the  weapons  assignment  problem  of  Section  1. 


Use  is  made  of  the  identity,  ax  = ex^nct 


First,  convert  the  data 


and  let  3^  = An  , for  k=l , and  £=l,...,q  . Introduce  the 

new  variables  {z^}  > and  form  the  equality  constraints  z^  = x^B^^  > 

for  &=!,..., q . The  equivalent  separable  optimization  problem  now  has  the 


maximize  Y!*-  u. [1  - e ] 


{5W’{Z* 


subject  to  the  restrictions  that 


\ * for  k=1’---’q  . = 5=i  • for  *=1”--’q  • 


= 0 ’ f°r  k=1-"->P; 


The  new  problem  has  pq  + q variables,  and  p + q nontrivial  linear  con- 
straints. 

Depending  upon  the  problem,  the  creation  of  an  equivalent  separable 
one  may  generate  a "large"  separable  problem  from  a moderate  sized  nonsep- 
arable  one.  For  discussion  of  a general  systematic  approach  for  separating 
nonlinear  programming  problems,  see  [McCormick  1972b]. 


Separable  problems  are  solved  by  the  large  linear  programming  systems 
using  the  device  of  approximating  the  nonlinear  functions  of  a single  vari- 
able by  piecewise  linear  functions.  There  are  many  approaches  for  doing 
this.  References  and  discussion  of  this  are  in  [Beale  1975]. 

A direct  approach  for  solving  large  nonlinear  programming  problems 
generally  consists  of  taking  advantage  of  the  special  structure  of  a par- 
ticular large  problem  and  modifying  the  linear  algebra  (matrix  techniques) 
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used  for  solving  the  problem  so  that  the  number  of  operations  (multiplica- 
tions and  additions)  is  reduced,  as  well  as  obtaining  a reduction  in  stor- 
age. To  illustrate  this,  consider  the  penalty  function  approach  to  the  in- 
ventory problem  of  Section  2.  Let  x be  the  vector  of  variables  {Q^,r^} 

The  penalty  function  approach  is  to  find  the  unconstrained  minimizer 
of 

pk  = f - rkLn(gl)  - rk*n(g2)  - r^nQ.  , 

for  a decreasing  sequence  of  values  {rk}  which  tend  to  zero.  The  uncon- 
strained minimizers  tend  to  the  constrained  solution.  The  fastest  method 
for  finding  the  unconstrained  minimizer  is  the  generalized  Newton's  method 
which  takes  the  form 

XK.+1  *1  " VxxMxJ  VPk(x£)t;Jl  ’ (2) 

(where  t^  is  the  step-size  scalar).  The  traditional  way  of  solving  the 

above  equations  is  to  form  the  nxn  Hessian  matrix  and  use  Cholesky  de- 
composition to  find  the  desired  matrix  equation  solution.  The  storage  re- 

2 3 

quired  is  n memory  locations  and  the  order  of  operations  is  n /3  . 

Thus  for  any  reasonable  number  of  inventory  items  (recall  n = 2N  ) this 

approach  is  computationally  prohibitive.  Like  all  large  problems,  there 

is  a special  structure  and  matrix  inversion  techniques  can  be  modified  to 

facilitate  the  solution  of  the  problem. 


The  natural  derivation  of  the  Hessian  matrix  is  depicted  in  Figure  1. 
If  it  were  formed  explicitly  it  would  require  4N2  memory  locations.  Kept 
in  its  implicit  form  it  requires  only  ION  locations  (actually  most  of  the 
numbers  are  already  available  and  need  not  be  stored  separately).  If  N 
were  1,000  this  is  the  difference  between  4,000,000  and  10,000. 


Computational  efficiency  is  obtained  by  inverting 


V2  P 
xx  k 


using 


the  Woodbury-Morrison-Sherman  formula  recursively,  i.e.,  the  inverse  of 
a matrix  perturbed  by  an  outerproduct  matrix  (or  dyad)  is  the  inverse  of 
the  original  matrix  perturbed  by  a dyad.  Specifically, 


A-1  - Avtc-1  + vTA-1v]' 


1 T -1 
v A 


! 


- 8 - 


[a  + VCVT]-1 


(3) 
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where  A is  n*n  , c is  a scalar,  and  v is  an  n*l  vector.  Applying 
this  to  (2),  first  A+B  is  formed.  The  inverse  of  (A+B)  is  gotten  by 
inverting  the  2x2  blocks.  This  computation  requires  2N  multiplications. 
The  storage  of  this  can  be  done  in  4N2  (if  symmetry  is  not  used)  locations. 

Next,  two  applications  of  the  formula  are  given.  The  computation  of  (A+B)  ^c 
requires  4N  multiplications  and  the  result  can  be  stored  in  2N+1  loca- 
tions. The  second  application  of  (3)  yields  the  desired  matrix  inverse  in 
implicit  form.  The  total  computations  required  are  about  16N  multiplica- 
tions and  the  inverse  can  be  stored  in  7N  memory  locations.  This  is  to  be 
compared  with  8N3/3  multiplications  required  for  the  Cholesky  method,  and 
4N2  memory  locations.  Further  details  of  this  are  in  [McCormick  1972a]. 

The  point  is  that  most  large  nonlinear  programming  problems  have  a 
special  structure,  and  algorithms  for  solving  them  can  modify  their  matrix 
techniques  to  reduce  the  number  of  operations  and  computer  storage.  For  any 
particular  problem,  the  required  modifications  can  be  figured  out.  The 
question  is,  is  there  a general  structure  which  occurs  in  all  problems  for 
which  modifications  to  handle  larger  problems  can  be  made.  This  point  is 
taken  up  in  Section  4. 

Direct  methods  for  solving  nonlinear  optimization  problems  do  not 
handle  problems  of  very  large  size.  For  a survey  of  software  available  and 
the  size  of  problems  handled  see  [Sandgren  1977],  [Waren  and  Lasdon  1977], 
and  [Wright  1978]. 

4.  A General  Approach  to  Large  Nonlinear 
Programming  Problems 

The  Lagrangian  function  associated  with  Problem  (1)  is 
L(x,u,w)  = f(x)  - £u.  g4(x)  + £w^  h^(x)  • 

Solving  the  problem  can  be  thought  of  as  trying  to  find  the  Karush-Kuhn- 
Tucker  multipliers  {u^}  , and  the  Lagrange  multipliers  {w^ } associated 

with  a point  x solving  the  equations 

VxL(x,u,w)  = 0 , 

^igi(x)  = 0 , for  i=l,...,m 

h.(x)  = 0 , j=l,...,p  . 

- 10  - 
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A modified  form  of  Newton's  method  for  solving  these  equations  requires  the 
inverse  of  the  matrix 

VxxLK’uk,W^ 

N(xk)T  0 

where  N (x^)  is  the  matrix  of  gradients  of  the  active  constraints  and 
equality  constraints  at  . The  full  matrix  inverse  need  not  be  obtained, 

just  , a matrix  generating  the  null  space  of  N (x^)  , and  the  inverse 


T 2 r k. 

(explicitly  or  implicitly  of  the  projected  Hessian),  V^L  , u ,w  J 

The  ability  of  nonlinear  programming  algorithms  to  solve  large  problems  is 
tied  up  with  the  ability  to  compute  efficiently  these  quantities. 

For  large,  sparse  nonlinear  programming  problems,  the  techniques  for 
computing  S (x^)  are  in  most  part  available  from  the  work  done  in  the  past 

on  linear  programming  problems.  There  is  currently  much  work  in  alternative 
ways  of  computing  and  updating  the  matrix  giving  the  null  space  for  linear 

equations.  Effort  needs  to  be  expended  in  computing  H^1  . As  in  the  in- 
ventory problem,  there  is  no  need  to  form  explicitly.  From  the  theory 

of  factorable  programming  it  turns  out  that  V^L^x^.u  ,w  J is  given  auto- 
matically as  the  sum  of  a diagonal  matrix  and  outer  product  matrices.  One 
way  of  utilizing  this  to  obtain  the  inverse  was  shown  in  the  inventory  prob- 
lem. There  are  many  other  linear  algebra  approaches  for  solving  this  prob- 
lem, which  can  take  advantage  of  the  sparseness  of  the  vectors  defining  the 
outer  product  form  and  the  diagonal  matrix.  For  more  details  see  [McCormick 
1978]. 
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